101 research outputs found

    Symbiosis between the TRECVid benchmark and video libraries at the Netherlands Institute for Sound and Vision

    Audiovisual archives are investing in large-scale digitisation efforts of their analogue holdings and, in parallel, ingesting an ever-increasing amount of born-digital files into their digital storage facilities. Digitisation has opened up new access paradigms and boosted re-use of audiovisual content. Query-log analyses show the shortcomings of manual annotation; archives are therefore complementing these annotations by developing novel search engines that automatically extract information from both the audio and visual tracks. Over the past few years, the TRECVid benchmark has developed a novel relationship with the Netherlands Institute for Sound and Vision (NISV) which goes beyond the NISV merely providing data and use cases to TRECVid. Prototype and demonstrator systems developed as part of TRECVid are set to become a key driver in improving the quality of search engines at the NISV and will ultimately help other audiovisual archives to offer more efficient and more fine-grained access to their collections. This paper reports the experiences of the NISV in leveraging the activities of the TRECVid benchmark.

    Learning Semantic Query Suggestions

    Probabilistic Reuse of Past Search Results

    In this paper, a new Monte Carlo algorithm that improves the precision of information retrieval by reusing past search results is presented. Experiments were carried out to compare the proposed algorithm with traditional retrieval on a simulated dataset in which documents, queries, and user judgments were all simulated. Exponential and Zipf distributions were used to build the document collections, a uniform distribution was applied to build the queries, and a Zeta distribution was used to simulate Bradford's law, representing the judgments of users. Empirical results show a better performance of our algorithm compared with traditional retrieval.
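    The abstract gives the distributions but no implementation details; the following is a minimal sketch in Python of how such a simulated dataset could be generated. The vocabulary size, collection sizes, and distribution parameters are illustrative assumptions, not the paper's settings.

    import numpy as np

    rng = np.random.default_rng(42)

    VOCAB_SIZE = 10_000   # assumed vocabulary size (illustrative)
    NUM_DOCS = 1_000      # assumed collection size (illustrative)
    NUM_QUERIES = 100

    # Document lengths drawn from an Exponential distribution; term occurrences
    # drawn from a Zipf distribution, matching the simulated collections above.
    doc_lengths = rng.exponential(scale=200, size=NUM_DOCS).astype(int) + 1
    documents = [rng.zipf(a=1.5, size=n) % VOCAB_SIZE for n in doc_lengths]

    # Query terms drawn uniformly from the vocabulary.
    queries = [rng.integers(0, VOCAB_SIZE, size=3) for _ in range(NUM_QUERIES)]

    # User judgments drawn from a Zeta (discrete power-law) distribution to
    # mimic Bradford's law: a few documents attract most relevance judgments.
    judged_docs = (rng.zipf(a=2.0, size=NUM_QUERIES * 5) - 1) % NUM_DOCS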

    Identifying Audiences of E-Infrastructures - Tools for Measuring Impact

    Research evaluation should take into account the intended scholarly and non-scholarly audiences of the research output. This holds for research infrastructures too, which often aim at serving a large variety of audiences. With research and research infrastructures moving to the web, new possibilities are emerging for evaluation metrics. This paper proposes a feasible indicator for measuring the scope of audiences who use web-based e-infrastructures, as well as their frequency of use. In order to apply this indicator, a method is needed for classifying visitors to e-infrastructures into relevant user categories. The paper proposes such a method, based on an inductive logic program and a Bayesian classifier. The method is tested, showing that visitors are efficiently classified into the selected categories with 90% accuracy. Consequently, the method can be used to evaluate the use of the e-infrastructure within and outside academia.
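    The abstract names the classifier family but not its features; the sketch below illustrates only the Bayesian-classification step in Python, assuming scikit-learn and hypothetical log-derived features (referrer, requested path, user agent). The inductive-logic-programming step and the paper's actual feature set are not reproduced here.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Each visit is a bag of categorical features extracted from the access
    # log (hypothetical examples; the abstract does not list the features).
    visits = [
        {"referrer": "university.example.edu", "path": "/api/search", "agent": "script"},
        {"referrer": "news.example.com", "path": "/collection/view", "agent": "browser"},
        {"referrer": "university.example.edu", "path": "/dataset/download", "agent": "browser"},
    ]
    labels = ["academic", "general_public", "academic"]  # assumed user categories

    # One-hot encode the categorical features, then fit a naive Bayes model.
    model = make_pipeline(DictVectorizer(), MultinomialNB())
    model.fit(visits, labels)

    print(model.predict([{"referrer": "school.example.org",
                          "path": "/collection/view", "agent": "browser"}]))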

    Pseudo Test Collections for Training and Evaluation of Microblog Search Algorithms

    Recent years have witnessed a persistent interest in generating pseudo test collections, both for training and evaluation purposes. We describe a method for generating queries and relevance judgments for microblog search in an unsupervised way. Our starting point is this intuition: tweets with a hashtag are relevant to the topic covered by the hashtag, and hence to a suitable query derived from the hashtag. Our baseline method selects all commonly used hashtags and all associated tweets as relevance judgments; we then generate a query from these tweets. Next, we generate a timestamp for each query, allowing us to use temporal information in the training process. We then enrich the generation process with knowledge derived from an editorial test collection for microblog search. We use our pseudo test collections in two ways. First, we tune parameters of a variety of well-known retrieval methods on them. Correlations with parameter sweeps on an editorial test collection are high on average, with a large variance over retrieval algorithms. Second, we use the pseudo test collections as training sets in a learning-to-rank scenario. Performance close to training on an editorial test collection is achieved in many cases. Our results demonstrate the utility of tuning and training microblog search algorithms on automatically generated training material.
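    As a rough illustration of the baseline method described above, the Python sketch below derives a pseudo test collection from hashtags. The usage threshold, the query-generation heuristic (most frequent co-occurring terms), and the timestamp choice (median tweet time) are assumptions; the abstract does not specify these details.

    from collections import Counter, defaultdict

    def build_pseudo_collection(tweets, min_uses=50, query_len=3):
        """Derive (query, timestamp, relevant tweet ids) triples from hashtags.

        `tweets` is assumed to be an iterable of (tweet_id, timestamp, text)
        tuples; thresholds are illustrative, not the paper's settings.
        """
        by_tag = defaultdict(list)
        for tweet_id, ts, text in tweets:
            for token in text.split():
                if token.startswith("#"):
                    by_tag[token.lower()].append((tweet_id, ts, text))

        collection = {}
        for tag, group in by_tag.items():
            if len(group) < min_uses:      # keep only commonly used hashtags
                continue
            # Query terms: most frequent non-hashtag words in the group's tweets.
            words = Counter(w.lower() for _, _, text in group
                            for w in text.split() if not w.startswith("#"))
            query = " ".join(w for w, _ in words.most_common(query_len))
            # Query timestamp: the median tweet time in the group (an assumption).
            timestamp = sorted(ts for _, ts, _ in group)[len(group) // 2]
            # All tweets carrying the hashtag serve as relevance judgments.
            relevant = {tweet_id for tweet_id, _, _ in group}
            collection[tag] = (query, timestamp, relevant)
        return collection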
